BBN: description of the PLUM system as used for MUC-5
Abstract
APPROACH

Traditional approaches to the problem of extracting data from texts have emphasized hand-crafted linguistic knowledge. In contrast, BBN's PLUM system (Probabilistic Language Understanding Model) was developed as part of an ARPA-funded research effort on integrating probabilistic language models with more traditional linguistic techniques. Our research and development goals are:

• more rapid development of new applications,
• the ability to train (and retrain) systems based on user markings of correct and incorrect output,
• more accurate selection among interpretations when more than one is found, and
• more robust partial interpretation when no complete interpretation can be found.

We began this research agenda approximately three years ago. During the past two years, we have evaluated much of our effort in porting our data extraction system (PLUM) to a new language (Japanese) and to two new domains.

Three key design features distinguish PLUM: statistical language modeling, learning algorithms, and partial understanding.

The first key feature is the use of statistical modeling to guide processing. For the version of PLUM used in MUC-5, part-of-speech information was determined by using well-known Markov modeling techniques embodied in BBN's part-of-speech tagger POST [5]. We also used a correction model, AMED [3], for improving the Japanese segmentation and part-of-speech tags assigned by JUMAN. For the microelectronics domain, we used a probabilistic model to help identify the role of a company in a capability (whether it is a developer, user, etc.). Statistical modeling in PLUM contributes to portability, robustness, and trainability.

The second key feature is our use of learning algorithms both to obtain the knowledge bases used by PLUM's processing modules and to train the probabilistic algorithms.
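The abstract states that learning algorithms train PLUM's probabilistic components (POST among them) but does not give the training procedure here. As a minimal sketch, assuming a bigram hidden Markov model of the kind commonly used for Markov part-of-speech tagging and supervised training on hand-tagged sentences, the maximum-likelihood parameters are simply relative frequencies of observed counts. The function `train_hmm`, the lowercasing of words, and the two-sentence corpus are hypothetical illustrations, not BBN's code or data.

```python
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Estimate bigram-HMM probabilities by relative frequency (maximum likelihood).

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns (start, trans, emit), each a dict of probability tables.
    """
    start_counts = Counter()             # tag -> count of sentence-initial uses
    trans_counts = defaultdict(Counter)  # previous tag -> next tag -> count
    emit_counts = defaultdict(Counter)   # tag -> lowercased word -> count

    for sentence in tagged_sentences:
        if not sentence:
            continue
        start_counts[sentence[0][1]] += 1
        for word, tag in sentence:
            emit_counts[tag][word.lower()] += 1
        # Pair each token with its successor to count tag bigrams.
        for (_, prev_tag), (_, tag) in zip(sentence, sentence[1:]):
            trans_counts[prev_tag][tag] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    start = normalize(start_counts)
    trans = {tag: normalize(counts) for tag, counts in trans_counts.items()}
    emit = {tag: normalize(counts) for tag, counts in emit_counts.items()}
    return start, trans, emit


# Hypothetical hand-tagged training data.
corpus = [
    [("The", "DET"), ("company", "NOUN"), ("makes", "VERB"), ("chips", "NOUN")],
    [("A", "DET"), ("user", "NOUN"), ("buys", "VERB"), ("the", "DET"), ("chip", "NOUN")],
]
start, trans, emit = train_hmm(corpus)
print(start)          # {'DET': 1.0}
print(trans["VERB"])  # {'NOUN': 0.5, 'DET': 0.5}
```

No smoothing is applied, so a word or tag transition never seen in training gets no probability at all; a practical tagger would add smoothing or an unknown-word model on top of these counts.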
We feel the key to portability of a data extraction system is automating the acquisition of the knowledge bases that need to change for a particular language or application. For the MUC-5 applications, we used learning algorithms to train POST, AMED, and the template-filler model mentioned above. We also used a statistical learning algorithm to learn case frames for verbs from examples (the algorithm and empirical results are in [4]).

The third key feature is partial understanding, by which we mean that all components of PLUM are designed to operate on partially interpretable input, taking advantage of information when available and not failing when information is unavailable. Neither …
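To illustrate Markov-model part-of-speech tagging together with the principle of not failing when information is unavailable, here is a minimal bigram Viterbi decoder. It is a generic sketch, not POST itself: the tag set, every probability, and the fixed floor probability `UNSEEN` for words a tag never emitted are invented assumptions. Because of that floor, a word absent from the model still receives a tag chosen from its context instead of causing the decoder to fail.

```python
import math

# Toy bigram HMM. The tag set and every probability below are invented for
# illustration; they are not POST's parameters.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"company": 0.4, "chip": 0.4, "makes": 0.2},
    "VERB": {"makes": 0.8, "chip": 0.2},
}
UNSEEN = 1e-6  # assumed floor so an unknown word does not zero out every path


def log_emit(tag, word):
    return math.log(EMIT[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable tag sequence for words under the toy bigram HMM."""
    if not words:
        return []
    # Each column maps tag -> (best log probability of a path ending in tag, previous tag).
    columns = [{t: (math.log(START[t]) + log_emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        prev_column = columns[-1]
        column = {}
        for t in TAGS:
            # Pick the predecessor tag that maximizes the path score into t.
            prev, score = max(
                ((p, prev_column[p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit(t, word), prev)
        columns.append(column)
    # Follow backpointers from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "company", "makes", "a", "chip"]))
# ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
print(viterbi(["the", "company", "makes", "a", "wafer"]))
# ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
```

In the toy lexicon both *makes* and *chip* are ambiguous, and the transition probabilities resolve them from context. The unknown word *wafer* is tagged NOUN because, in this model, a determiner is most likely followed by a noun.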
Similar resources
BBN: description of the PLUM system as used for MUC-3
Traditional approaches to the problem of extracting data from texts have emphasized handcrafted linguistic knowledge. In contrast, BBN's PLUM system (Probabilistic Language Understanding Model) was developed as part of a DARPA-funded research effort on integrating probabilistic language models with more traditional linguistic techniques. Our research and development goals are: • more rapid de...
BBN: description of the PLUM system as used for MUC-4
Traditional approaches to the problem of extracting data from texts have emphasized hand-crafted linguistic knowledge. In contrast, BBN's PLUM system (Probabilistic Language Understanding Model) was developed as part of a DARPA-funded research effort on integrating probabilistic language models with more traditional linguistic techniques. Our research and development goals are: • more rapid ...
BBN PLUM: MUC-3 test results and analysis
Perhaps the most important facts about our participation in MUC-3 reflect our starting point and goals. In March 1990, we initiated a pilot study on the feasibility and impact of applying statistical algorithms in natural language processing. The experiments were concluded in March 1991 and led us to believe that statistical approaches can effectively improve knowledge-based approaches [W...
BBN: Description of the SIFT System as Used for MUC-7
For MUC-7, BBN has for the first time fielded a fully-trained system for NE, TE, and TR; results are all the output of statistical language models trained on annotated data, rather than programs executing handwritten rules. Such trained systems have some significant advantages: • They can be easily ported to new domains by simply annotating data with semantic answers. • The complex interactions...
BBN's PLUM Probabilistic Language Understanding System
Three key design features distinguish PLUM from other approaches: statistical language modeling, learning algorithms and partial understanding. The first key feature is the use of statistical modeling to guide processing. For the version of PLUM used in MUC-5, part of speech information was determined by using well-known Markov modeling techniques embodied in BBN's part-of-speech tagger POST [5...
Publication date: 1993